Bioinformatics (Thomas Dandekar, Meik Kunz)

newly produced DNA radioactively, but also mixed dideoxyadenosine triphosphate (ddATP) with the normal deoxyadenosine triphosphate (dATP). Because a dideoxy nucleotide lacks the 3′-hydroxyl group needed to attach the next nucleotide, the polymerase "stutters" at every adenine: with a probability of about 1% it incorporates the dideoxy variant there, and the chain breaks off. After the radiolabelled fragments have been separated by size and exposed to a film, all the adenines in the sequence become visible. With the other dideoxy nucleotides I read the other three bases in the same way. I can also replace the radioactivity with nucleotides carrying fluorescent labels of different colours and use a laser to determine the nucleotides online. As a result, DNA sequences could be determined ever faster, and the resulting flood of sequences is finally stored in large computer databases. As the sequencing reaction and the separation of the fragments were miniaturised further and further, the sequencing speed kept increasing, so that it is now possible to read many millions of nucleotides per lane and to process many lanes simultaneously.

In order to determine the genome sequence, the DNA of an organism is first chopped up (“shotgun” method), and then all these small pieces are sequenced simultaneously at lightning speed. However, this makes another task more and more difficult: putting the many sequence snippets together in the right way, i.e. reconstructing the correct genome sequence from the snippets (“mapping” and “assembly” of the genome sequence). Regions in which a sequence is repeated again and again (repeat regions) are particularly difficult to represent correctly in terms of their length and number of repeats, because snippets that lie entirely within a repeat could stem from any of its copies.
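The chain-termination principle can be mimicked in a few lines. The following Python sketch is a simplified simulation, not a model of real reaction kinetics: it assumes that we work directly on the sequence of the newly synthesised strand (rather than its complementary template), that each occurrence of the chosen base terminates a growing copy independently with the roughly 1% probability mentioned above, and that copies which never terminate are simply ignored. The function names `chain_termination` and `read_gel` are hypothetical.

```python
import random
from collections import Counter


def chain_termination(strand: str, base: str, p_stop: float = 0.01,
                      n_copies: int = 10_000, seed: int = 0) -> Counter:
    """Simulate one dideoxy reaction for `base` (simplified, hypothetical model).

    At every position carrying `base`, a growing copy incorporates the
    dideoxy nucleotide with probability `p_stop` and cannot be extended.
    Returns a Counter mapping fragment length -> number of fragments.
    """
    rng = random.Random(seed)
    fragments = Counter()
    for _ in range(n_copies):
        for length, nt in enumerate(strand, start=1):
            if nt == base and rng.random() < p_stop:
                fragments[length] += 1  # the chain breaks off at this length
                break
        # copies that never incorporate a dideoxy nucleotide are not counted
    return fragments


def read_gel(strand: str, **kwargs) -> str:
    """Run all four reactions and read the bases off in order of fragment size."""
    band_at = {}
    for base in "ACGT":
        for length in chain_termination(strand, base, **kwargs):
            band_at[length] = base  # a band at this length means `base` sits here
    return "".join(band_at[length] for length in sorted(band_at))
```

For example, `chain_termination("GATTACA", "A")` should yield fragments of lengths 2, 5 and 7, the positions of the adenines, and `read_gel("GATTACA")` should return the input sequence, since with 10,000 copies per reaction every position is almost certainly hit. The number of fragments per length falls off only slowly along the strand (by a factor of about 0.99 per preceding adenine), which is why a small termination probability lets one read long stretches of sequence.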
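The assembly problem, and why repeats make it hard, can be illustrated with a deliberately idealised sketch. It assumes error-free "reads" consisting of every overlapping window of length k from a known sequence (perfect coverage, one strand only, duplicates removed), and it uses a simple greedy overlap assembler that repeatedly merges the two contigs with the longest suffix-prefix overlap. Greedy merging is only one possible strategy and is not claimed to be how real assemblers work; the names `kmer_reads`, `overlap` and `greedy_assemble` are hypothetical.

```python
def kmer_reads(genome: str, k: int) -> list[str]:
    """Idealised shotgun reads: every length-k window, duplicates removed, sorted."""
    return sorted({genome[i:i + k] for i in range(len(genome) - k + 1)})


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of a that is a prefix of b (0 if shorter than min_len)."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):
            return len(a) - start  # leftmost match = longest overlap
        start += 1


def greedy_assemble(reads: list[str], min_len: int) -> list[str]:
    """Repeatedly merge the pair of contigs with the largest overlap."""
    contigs = sorted(set(reads))
    while True:
        # Drop contigs that are contained in another contig.
        contigs = [c for c in contigs
                   if not any(c != d and c in d for d in contigs)]
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            return contigs  # no overlaps left: return the contigs found so far
        merged = contigs[best_i] + contigs[best_j][best_len:]
        contigs = [c for n, c in enumerate(contigs) if n not in (best_i, best_j)]
        contigs.append(merged)
```

For a short sequence without repeated substrings of length k − 1, such as `"ATGCGTACCTGA"` with k = 5, the greedy merges recover the original sequence. For the two tandem repeats `"TT" + "CAG" * 3 + "GG"` and `"TT" + "CAG" * 4 + "GG"` with k = 4, however, the sets of distinct reads are identical, so any assembler that sees only these sets must give the same answer for both and cannot tell how many copies of `CAG` there are. Only information beyond the set of distinct reads, for example how often each read occurs or reads longer than the repeat, can distinguish the two.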


Other parts of the genome sequence, whose function cannot be inferred so easily from high similarity to known sequences, have to be analysed in more detail. Here, machine learning and artificial intelligence methods (Chap. 14) help to interpret the sequence. For example,


3 Genomes: Molecular Maps of Living Organisms